METHODS AND CIRCUITS FOR MEASURING CLOCK SKEW ON
PROGRAMMABLE LOGIC DEVICES 

Siuki Chan 

FIELD OF THE INVENTION 

[0001] This invention relates generally to methods and 
circuit configurations for measuring signal skew in 
programmable logic devices. 

BACKGROUND 

[0002] A programmable logic device (PLD) is a well-known 
type of digital integrated circuit that may be programmed 
by a user (e.g., a circuit designer) to perform specified 
logic functions. One type of PLD, the field-programmable 
gate array (FPGA), typically includes an array of
configurable logic blocks (CLBs) that are programmably
interconnected to each other and to programmable
input/output blocks (IOBs). This collection of
configurable logic is personalized by loading configuration 
data into internal configuration memory cells that define 
how the CLBs, interconnections, and IOBs are configured. 
For a detailed discussion of an exemplary FPGA, see U.S. 
Patent No. 6,144,220 entitled "FPGA Architecture Using
Multiplexers that Incorporate a Logic Gate," by Steven P.
Young, which is incorporated herein by reference. 
[0003] Figure 1 (Prior Art) depicts a conventional FPGA 
100, examples of which include the Spartan™ and Virtex™ 
FPGAs available from Xilinx, Inc., of San Jose, California. 
FPGA 100 includes an array of programmably interconnected 
CLBs 105. FPGA 100 additionally includes a clock 
distribution network 110 that can be connected to internal
or external clock sources via a global clock buffer BUFG.
Many other FPGA resources are omitted from Figure 1 for 
brevity. 

[0004] Manufacturers of PLDs, including FPGAs, would
like to guarantee the highest speed performance possible 
without devices failing to meet the guaranteed timing 
specifications. PLD designers therefore measure circuit 
timing as accurately as possible to minimize the guard 
bands required to ensure correct device performance. U.S. 
Patent No. 6,144,262 entitled "Circuit for Measuring Signal
Delays of Asynchronous Register Inputs," by Christopher
Kingsley describes circuits and methods of measuring
circuit timing in programmable logic devices, and is
incorporated herein by reference. U.S. Patent No.
5,795,068 entitled "Method and Apparatus for Measuring
Localized Temperatures and Voltages on Integrated
Circuits," by Robert O. Conn describes ring oscillator
configurations on FPGAs, and is also incorporated herein by
reference.

[0005] Clock distribution network 110 includes a source 
spine 110S that conveys clock signals to a source node 112 
in the interior of FPGA 100. From there, a horizontal 
spine 110H conveys clock signals to a number of vertical 
clock spines 110V. Finally, a number of clock destination 
branches 110D extend to each CLB 105. Clock distribution
network 110 can be programmably connected to any of CLBs 
105 via programmable interconnect points. The above-cited 
Young patent describes exemplary programming technologies. 
[0006] Clock distribution network 110 typically includes 
clock buffers 115 placed and sized to minimize clock skew, 
where skew is defined as the difference in path delays from
clock input GCLK to each of CLBs 105 and any other clock
loads, such as embedded blocks of memory and IOBs.
[0007] Clock distribution network 110 is designed to
minimize clock skew, and so the delays inherent in network
110 are short relative to the delays associated with other
FPGA resources. The small skew is beneficial from the
standpoint of performance, but makes it difficult to
determine clock skew accurately because the test
circuitry normally introduces more skew than the clock
network. There is therefore a need for a more accurate
means of measuring clock skew on programmable logic
devices.

SUMMARY

[0008] The present invention is directed to a method for
accurately measuring the skew of clock distribution
networks on programmable logic devices. Individual clock
distribution networks are modeled using a sequence of
delay-element configurations formed on the device using
configurable logic. Each delay element includes a portion
of the clock network for which skew is of interest, and
consequently exhibits a delay that depends, in part, on the 
skew imposed by the portion of interest. The delay through 
each delay element is measured by incorporating the delay 
element into ring oscillators and measuring the resulting 
period. 

[0009] The various delay-element configurations are 
modeled mathematically as the sum of a series of delays. 
The delay-element configurations are designed so their 
respective equations can be combined to solve for the delay 
contribution, or skew, of the portion of the clock network 
for which skew is to be measured. The delay associated
with the portion of interest can then be combined with skew
measurements for other portions of the clock network to 
more completely describe the network. 

[0010] The claims, and not this summary, define the 
scope of the invention. 

BRIEF DESCRIPTION OF THE FIGURES

[0011] Figure 1 (Prior Art) depicts a conventional FPGA 
100. 

[0012] Figure 2A depicts an FPGA oscillator 
configuration 200. 

[0013] Figure 2B depicts an FPGA oscillator 
configuration 250.

[0014] Figures 3A and 3B depict respective FPGA
oscillator configurations 300 and 350.

[0015] Figures 4A, 4B, 4C, and 4D depict respective
oscillator configurations 400, 420, 440, and 460.

[0016] Figure 5 depicts an oscillator configuration 500
implemented on an FPGA that includes one or more blocks of
dedicated user memory.

DETAILED DESCRIPTION

[0017] Figures 2A-4D schematically depict FPGA 
configurations used in accordance with embodiments of the 
invention to accurately measure global clock skew for clock 
distribution network 110 of Figure 1. In the examples, the 
FPGA is a Virtex™ XCV1000 FPGA, available from Xilinx, 
Inc., which includes an array of 96 columns and 64 rows of 
CLBs, or a total of 6,144 CLBs. The number of CLBs and
other FPGA resources shown in the figures is limited for 
brevity. 



X-884 US PATENT

[0018] Figure 2A depicts an FPGA oscillator
configuration 200 in which a CLB R24C36 (for row 24, column
36), a CLB R24C37, and a feedback circuit 205 are
interconnected to form a ring oscillator. Circuit 205 and
the associated connections, made up of available FPGA
resources, connect to clock distribution network 110 via
global clock buffer BUFG. The resources that form the ring
are drawn using dashed and bold interconnect and clock
lines. In the depicted embodiment, CLB
R24C36 is configured to be synchronous and non-inverting,
though different synchronous or asynchronous configurations
might also be used.

[0019] The FPGA is programmed (i.e., configured) so the
global clock buffer BUFG connects to the clock input
terminal of CLB R24C36 via source spine 110S, horizontal
clock spine 110H, a vertical clock spine S1 (part of a
spine 110V), and one of destination branches 110D. The
synchronous output terminal of CLB R24C36 is programmably
connected to an input terminal of CLB R24C37 via some
programmable interconnect resources depicted as net
C36->C37. Finally, an output terminal of CLB R24C37 is
programmably connected to the input terminal of global
buffer BUFG via programmable interconnect resources 215 and
220 and circuit 205. As oscillator configuration 200
oscillates, the oscillation period T200 provides a measure
of the speed of the interconnected components. For
example, if the average period T200 is ten nanoseconds, then
the average time required for positive- and negative-going
signal transitions to traverse the components in the ring
is ten nanoseconds. In the depicted embodiment, CLB R24C37
is connected in the ring via asynchronous input and output
terminals, though synchronous input and output terminals
may be used in other embodiments. The above-incorporated
Kingsley patent describes some oscillators for use with the
present invention; other embodiments will be readily
apparent to those of skill in the art.

[0020] The delay around oscillator 200 is the sum of the
delays associated with horizontal spine 110H, vertical
spine S1, a twelve-column-long (12C) portion of a
destination branch 110D, the clock-to-out (Clk->Out) delay
of CLB R24C36, the interconnect delay of net C36->C37, and
the combined delay K through CLB R24C37,
connections 215 and 220, circuit 205, buffer BUFG, and
source spine 110S. The analysis can be simplified by
assuming nearby CLBs exhibit identical clock-to-out
(Clk->Out) delays. This is a reasonable assumption,
particularly for identical components formed in close
proximity.

[0021] Stated mathematically, the oscillation period T200
of oscillator configuration 200 is:

T200 = VSK + VS1 + 12C + Clk->Out + C36->C37 + K    (1)

[0022] where VSK is the skew between spines S1 and S2,
VS1 is the delay component contributed by a portion of vertical
spine S1, 12C is the delay associated with a 12-column-long
portion of branch 110D, Clk->Out is the clock-to-out delay
of a CLB, C36->C37 is the delay encountered by signals
traveling left-to-right from column 36 to column 37, and K
is the delay associated with that portion of oscillator
configuration 200 depicted using dashed lines.
[0023] The oscillation period T200 of configuration 200
is generally not, by itself, enough information to
determine the delay associated with any one of the
components of the ring. The FPGA is therefore reconfigured
to form one or more additional test structures.
[0024] Figure 2B depicts an FPGA configuration 250 in
which a pair of CLBs R24C37 and R24C38, global clock buffer
BUFG, and the identical circuit 205 of Figure 2A are
interconnected to form a second ring oscillator. CLB
R24C37, circuit 205, clock buffer BUFG, and the dashed
portion of clock distribution network 110 and interconnect
resources 215 and 220 are identical to the like-identified
structures of Figure 2A; consequently, the sum of the
combined delay contributions of those elements, "K" in
equation 1, is identical in oscillator configurations 200
and 250. The portions of the oscillators depicted as
connected via solid lines in the figures can be considered
delay elements for which the difference in signal
propagation delays provides a measure of clock skew.
Including the delay elements in ring oscillators allows for
accurate measures of propagation delay through the delay
elements.

[0025] In Figure 2B, the FPGA is programmed so the clock
input terminal of CLB R24C38 connects to the output
terminal of global clock buffer BUFG via an 11-column-long
portion of one of destination branches 110D, vertical spine
S2, and source spine 110S. The synchronous output terminal
of CLB R24C38 is programmably connected to an asynchronous
input terminal of CLB R24C37 via some programmable
interconnect resources depicted as net C37<-C38. Finally,
as in configuration 200, an output terminal of CLB R24C37
is programmably connected to the input terminal of global
buffer BUFG via programmable interconnect resources 215 and
220 and circuit 205.
[0026] Stated mathematically, the oscillation period T250
of oscillator configuration 250 is:

T250 = VS2 + 11C + Clk->Out + C37<-C38 + K    (2)

[0027] where VS2 is the delay of a portion of vertical
spine 110V, 11C is the delay associated with an 11-column-long
portion of a branch 110D, Clk->Out is the clock-to-out
delay of CLB R24C38, C37<-C38 is the delay encountered by
signals traveling right-to-left from column 38 to column
37, and K is the delay associated with that portion of
oscillator configuration 250 depicted using dashed lines,
including the delay through CLB R24C37.
[0028] Comparing periods T200 and T250 of respective
configurations 200 and 250 provides a measure of the skew
VSK between vertical spines S1 and S2. Subtracting
equation 2 from equation 1 gives:

T200 - T250 = (VSK + VS1 + 12C + Clk->Out + C36->C37 + K) -
              (VS2 + 11C + Clk->Out + C37<-C38 + K)

            = VSK + VS1 - VS2 + C + C36->C37 - C37<-C38    (3)

[0029] Delays VS1 and VS2 are from identical or nearly
identical resources, and therefore can be presumed to be
equal (i.e., VS1 = VS2). Thus,

T200 - T250 = VSK + C + C36->C37 - C37<-C38    (4)
[0030] Different programmable logic devices route 
differently. For a given PLD, the values of C36->C37 and 
C37<-C38 may be close enough to assume they cancel one
another. Moreover, the contribution of one-column-width of
clock delay C can either be ignored or estimated by
simulation. Such assumptions reduce the number of
measurements used to find VSK, the skew between vertical
spines S1 and S2. For example, assuming C36->C37 =
C37<-C38 simplifies equation 4 to:

VSK = T200 - T250 - C    (5)

[0031] Skew VSK can thus be approximated using periods 
T200 and T250. 
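
Equations 1, 2, and 5 can be checked numerically. The following Python sketch builds the two periods from component delays and confirms that equation 5 recovers the skew when the two interconnect delays are equal and VS1 = VS2. The function names and every numeric value are illustrative assumptions, not measured data.

```python
import math

def period_200(vsk, vs1, c, clk_out, c36_to_c37, k):
    """Equation 1: period of oscillator configuration 200."""
    return vsk + vs1 + 12 * c + clk_out + c36_to_c37 + k

def period_250(vs2, c, clk_out, c37_from_c38, k):
    """Equation 2: period of oscillator configuration 250."""
    return vs2 + 11 * c + clk_out + c37_from_c38 + k

def skew_eq5(t200, t250, c):
    """Equation 5: valid only if C36->C37 equals C37<-C38 and VS1 equals VS2."""
    return t200 - t250 - c

# Hypothetical delays in nanoseconds, chosen only to exercise the formulas.
VSK, VS, C, CLK_OUT, HOP, K = 0.05, 0.40, 0.01, 1.00, 0.30, 5.00

t200 = period_200(VSK, VS, C, CLK_OUT, HOP, K)
t250 = period_250(VS, C, CLK_OUT, HOP, K)
estimate = skew_eq5(t200, t250, C)
print(f"T200={t200:.3f} ns  T250={t250:.3f} ns  VSK estimate={estimate:.3f} ns")
```

The estimate matches the assumed skew only because the left-to-right and right-to-left hops were given the same delay; if they differ, the difference appears directly as an error in the result.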

[0032] It may be difficult or impossible to route some 
PLDs such that the left-to-right connections (e.g., 
C36->C37) provide the same delays as the right-to-left 
connections (e.g., C37<-C38). In such cases, equation 4
cannot be simplified to equation 5. The contribution of 
one-column-width of clock delay C might also be of 
interest. Such cases may warrant additional measurements. 
[0033] Figures 3A and 3B depict respective oscillator 
configurations 300 and 350, the periods of which provide 
additional data for finding the skew between spines S1 and
S2. As with the preceding figures, the dashed and bold 
lines indicate which components form the oscillators. The 
dashed lines and components are identical circuit 
configurations in both Figures 3A and 3B, and their 
equivalent delay contributions are symbolized by a constant 
M. 

[0034] Using the same method described above for 
determining the periods associated with oscillator 
configurations 200 and 250, the respective periods T300 and
T350 of oscillator configurations 300 and 350 are:

T300 = VSK + VS1 + 11C + Clk->Out + C35->C36 + M    (6)

and

T350 = VS2 + 12C + Clk->Out + C36<-C37 + M    (7)

Recalling that VS1 = VS2 and subtracting equation 7 from
equation 6 gives:

T300 - T350 = (VSK + VS1 + 11C + Clk->Out + C35->C36 + M) -
              (VS2 + 12C + Clk->Out + C36<-C37 + M)

            = VSK - C + C35->C36 - C36<-C37    (8)

[0035] Some PLDs, including Virtex™ FPGAs, can be
configured so all of the left-to-right connections are
identical (e.g., C35->C36 = C36->C37) and all right-to-left
connections are identical (e.g., C35<-C36 = C36<-C37).
Thus configured, the contributions of like connections can
be combined for simplicity. Letting DLR = delay left-to-right
and DRL = delay right-to-left and adding equations 4
and 8 gives:

T200 - T250 + T300 - T350 = 2(VSK + DLR - DRL)    (9)

or

VSK = (T200 - T250 + T300 - T350)/2 + DRL - DLR    (10)

where VSK is the skew between vertical spines S1 and S2,
the parameter of interest.
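
As a rough illustration of equation 10, the Python sketch below synthesizes four periods whose differences obey equations 4 and 8 and shows that the estimate does not change as the one-column clock delay C is varied. The helper names, the placeholder constants k and m, and all numeric values are assumptions made for the example.

```python
import math

def skew_eq10(t200, t250, t300, t350, drl_minus_dlr):
    """Equation 10: VSK = (T200 - T250 + T300 - T350)/2 + DRL - DLR."""
    return (t200 - t250 + t300 - t350) / 2 + drl_minus_dlr

def synth_periods(vsk, c, dlr, drl, k=6.0, m=6.5):
    """Return (T200, T250, T300, T350) whose differences obey equations 4 and 8.

    k and m stand in for all the terms that cancel in each subtraction.
    """
    t250 = k
    t200 = k + vsk + c + dlr - drl   # equation 4
    t350 = m
    t300 = m + vsk - c + dlr - drl   # equation 8
    return t200, t250, t300, t350

# Hypothetical values in nanoseconds.
TRUE_VSK, DLR, DRL = 0.05, 0.30, 0.28
estimates = []
for c in (0.0, 0.01, 0.05):
    periods = synth_periods(TRUE_VSK, c, DLR, DRL)
    estimates.append(skew_eq10(*periods, drl_minus_dlr=DRL - DLR))
print(estimates)
```

In practice DRL - DLR is not known in advance, which is why the additional measurements that yield equation 18 are needed.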

[0036] As can be seen in equation 9, the data obtained 
using oscillator configurations 300 and 350 cancelled the 
contribution of one-column-width of delay C from the 
resulting skew measurement, leaving skew VSK dependent only 
upon the measured periods and the difference in delays 
between the left-to-right and right-to-left connections 
between adjacent CLBs.
[0037] Figures 4A, 4B, 4C, and 4D depict oscillator
configurations 400, 420, 440, and 460, the periods of which
provide additional data for finding the difference between
the left-to-right delay DLR and the right-to-left delay
DRL. This difference DLR-DRL can then be used with
equation 10 to calculate the skew VSK between vertical
clock spines S1 and S2. As with the preceding figures, the
dashed and bold lines indicate which components form the
oscillators. The dashed lines and components are identical
circuit configurations in each of Figures 4A-4D, and a
constant N symbolizes their delay contributions.
[0038] Using the same method described above for
determining the periods associated with oscillator
configurations 200 and 250, the respective periods T400,
T420, T440, and T460 of oscillator configurations 400, 420,
440, and 460 are:

Subtracting equation 12 from equation 11 gives:

T400 - T420 = DLR - DRL - 2C    (15)

Subtracting equation 14 from equation 13 gives:

T440 - T460 = DLR - DRL + 2C    (16)

Adding equations 15 and 16 gives:

T400 - T420 + T440 - T460 = 2(DLR - DRL)

or

DLR - DRL = (T400 - T420 + T440 - T460)/2    (17)

Multiplying equation 17 by -1 gives:

DRL - DLR = (T420 - T400 + T460 - T440)/2    (18)
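
Equations 17 and 18 reduce to simple arithmetic on four measured periods. The sketch below, in Python, uses hypothetical period values chosen to satisfy equations 15 and 16 with DLR - DRL = 0.02 ns and C = 0.01 ns; the function names and numbers are assumptions for illustration only.

```python
import math

def dlr_minus_drl(t400, t420, t440, t460):
    """Equation 17: DLR - DRL = (T400 - T420 + T440 - T460)/2."""
    return (t400 - t420 + t440 - t460) / 2

def drl_minus_dlr(t400, t420, t440, t460):
    """Equation 18: the negation of equation 17."""
    return (t420 - t400 + t460 - t440) / 2

# Hypothetical periods in nanoseconds:
#   T400 - T420 = 0.02 - 2*0.01 = 0.00   (equation 15)
#   T440 - T460 = 0.02 + 2*0.01 = 0.04   (equation 16)
t400, t420, t440, t460 = 7.10, 7.10, 7.34, 7.30

forward = dlr_minus_drl(t400, t420, t440, t460)
reverse = drl_minus_dlr(t400, t420, t440, t460)
print(f"DLR - DRL = {forward:.3f} ns, DRL - DLR = {reverse:.3f} ns")
```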

[0039] Oscillator configurations 400, 420, 440, and 460 
of Figures 4A-4D thus collectively provide a measure of the 
difference in delay between right-to-left and left-to-right 
programmable interconnections between adjacent CLBs. The
resulting time measurement DRL - DLR can be used to solve 
for skew VSK using equation 10 as follows: 

VSK = (T200 - T250 + T300 - T350)/2 +
      (T420 - T400 + T460 - T440)/2    (19)

or

VSK = (T200 - T250 + T300 - T350 + T420 - T400 + T460 - T440)/2    (20)

[0040] Thus, the eight oscillator configurations 
depicted in Figures 2A-4D collectively provide the 
information required for an accurate measurement of the 
skew VSK between vertical clock spines S1 and S2.
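
Equation 20 can be applied directly to a table of the eight measured periods. The following Python sketch assumes the periods are supplied in a dictionary keyed by the names used in the text; the dictionary layout, the function name, and the sample values are hypothetical.

```python
import math

REQUIRED = ("T200", "T250", "T300", "T350", "T400", "T420", "T440", "T460")

def skew_eq20(periods):
    """Equation 20: skew VSK from the eight oscillator periods."""
    missing = [name for name in REQUIRED if name not in periods]
    if missing:
        raise KeyError(f"missing oscillator periods: {missing}")
    p = periods
    return (p["T200"] - p["T250"] + p["T300"] - p["T350"]
            + p["T420"] - p["T400"] + p["T460"] - p["T440"]) / 2

# Hypothetical periods in nanoseconds, consistent with VSK = 0.05,
# C = 0.01, and DLR - DRL = 0.02 under equations 4, 8, 15, and 16.
measured = {
    "T200": 6.88, "T250": 6.80,
    "T300": 6.90, "T350": 6.84,
    "T400": 7.10, "T420": 7.10,
    "T440": 7.34, "T460": 7.30,
}
vsk = skew_eq20(measured)
print(f"VSK = {vsk:.3f} ns")
```

Rejecting an incomplete set of periods up front avoids silently computing a skew from fewer than the eight configurations the equation requires.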
[0041] Figure 5 depicts an oscillator configuration 500 
implemented on an FPGA, such as a Virtex™E device, that 
includes one or more blocks of dedicated user memory. For 
a detailed description of Virtex™E FPGAs, see the Xilinx 
Advance Product Specification, DS022 (v1.3) February 28,
2000, pages 3-1 to 3-74, which is incorporated herein by
reference.

[0042] In the FPGA of Figure 5, an embedded memory block 
505 is arranged in a column extending between CLB column 36 
and CLB column 37. The skew-measurement methods described 
above can be used for the device of Figure 5, assuming the 
delay contribution of left-to-right connection 510 equals
delay contribution DLR for a one-column left-to-right
connection plus some delay associated with traversing
memory block 505 (i.e., DLR + DMEM) and the delay
contribution of a right-to-left connection (not shown)
across memory block 505 is delay contribution DRL for a
one-column right-to-left connection plus the delay associated
with traversing memory block 505 (i.e., DRL + DMEM). The
foregoing methods can be used to measure clock skew because
the delay contribution DMEM associated with memory 505
cancels in the application of the foregoing equations so
equation 20 still provides a valid measure of skew VSK.
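
One way to see the cancellation is sketched below in Python. It assumes, based on the placement of memory block 505 between columns 36 and 37, that only the C36->C37 hop of configuration 200 and the C36<-C37 hop of configuration 350 cross the memory column, and that the measurement of DLR - DRL is unaffected. These assumptions, the function name, and the numeric values are illustrative and are not taken from Figure 5.

```python
import math

def pair_sum(vsk, c, dlr, drl, d_mem):
    """(T200 - T250) + (T300 - T350) with a memory column between columns 36 and 37."""
    # Equation 4 with the left-to-right hop C36->C37 lengthened by d_mem.
    diff_200_250 = vsk + c + (dlr + d_mem) - drl
    # Equation 8 with the right-to-left hop C36<-C37 lengthened by d_mem.
    diff_300_350 = vsk - c + dlr - (drl + d_mem)
    return diff_200_250 + diff_300_350

# Hypothetical delays in nanoseconds.
without_mem = pair_sum(0.05, 0.01, 0.30, 0.28, d_mem=0.0)
with_mem = pair_sum(0.05, 0.01, 0.30, 0.28, d_mem=0.75)
print(f"sum without memory: {without_mem:.3f} ns, with memory: {with_mem:.3f} ns")
```

Under these assumptions the memory delay enters one difference with a positive sign and the other with a negative sign, so the sum used in equation 9 is unchanged.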
[0043] Referring back to Figure 1, the depicted FPGA 
includes five vertical clock spines 110V. The above- 
described methods can be used to find the skew between the 
first and second vertical clock spines, the second and 
third vertical clock spines, the third and fourth vertical 
clock spines, and the fourth and fifth vertical clock 
spines. The resulting collection of data can then be used 
to determine the skew between any two vertical clock 
spines.
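
The text does not spell out how the adjacent-pair measurements are combined. A minimal Python sketch, assuming each adjacent skew is reported with a consistent sign so that skews add along the chain of spines, is given below; the function names and sample values are hypothetical.

```python
import math
from itertools import accumulate

def spine_offsets(adjacent_skews):
    """Offset of each spine relative to the first, from adjacent-pair skews."""
    return [0.0, *accumulate(adjacent_skews)]

def skew_between(adjacent_skews, i, j):
    """Signed skew between spines i and j (0-based), assuming skews add along the chain."""
    offsets = spine_offsets(adjacent_skews)
    return offsets[j] - offsets[i]

# Hypothetical skews in nanoseconds for the pairs (1,2), (2,3), (3,4), (4,5).
adjacent = [0.05, -0.02, 0.03, 0.01]

first_to_fifth = skew_between(adjacent, 0, 4)
second_to_fourth = skew_between(adjacent, 1, 3)
print(f"spine 1 to 5: {first_to_fifth:.3f} ns, spine 2 to 4: {second_to_fourth:.3f} ns")
```

With five spines, four adjacent measurements are enough to derive all ten pairwise skews under this additive assumption.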

[0044] The foregoing methods measure skew between 
vertical clock spines in the depicted examples. Skew 
measurements between destination branches 110D may also be
of interest, and can be combined with the above-described 
skew measurements to give a comprehensive skew analysis for 

an entire FPGA. Patent Application Serial No. 

entitled "METHODS AND CIRCUITS FOR MEASURING CLOCK SKEW ON 
PROGRAMMABLE LOGIC DEVICES," by Siuki Chan, filed herewith 
[docket number X-885], describes methods of measuring skew
between destination branches and is incorporated herein by
reference.
[0045] FPGA components are connected in various ways:
some components are directly connected, others are 
connected via intermediate components, such as buffers, and 
still others are programmably connectable, which is to say 
they can be programmably connected via programmable 
interconnect resources. In each instance, components are 
connected to establish some desired electrical 
communication between two or more circuit nodes, or 
terminals. Such communication may often be accomplished 
using a number of circuit configurations, as will be 
understood by those of skill in the art. 

[0046] While the present invention has been described in 
connection with specific embodiments, variations of these 
embodiments will be obvious to those of ordinary skill in 
the art. For example, multiple embodiments of the above- 
described oscillator configurations can be used 
simultaneously on devices that include more than one signal 
tree for which skew measurements are of interest. 
Moreover, above-described skew measurements can be done in 
any order, and other rows of CLBs (e.g., row 24 of Figures 
2A-5) could just as easily be used to perform skew 
measurements. Therefore, the spirit and scope of the 
appended claims should not be limited to the foregoing 
description.